Abstract

Nondeterministic automata may be viewed as succinct programs implementing deterministic auto- 
mata, i.e. complete specifications. Converting a given deterministic automaton into a small non- 
deterministic one is known to be computationally very hard; in fact, the ensuing decision problem is 
PSPACE-complete. This paper stands in stark contrast to the status quo. We restrict attention to 
subatomic nondeterministic automata, whose individual states accept unions of syntactic congruence 
classes. They are general enough to cover almost all structural results concerning nondeterministic 
state-minimality. We prove that converting a monoid recognizing a regular language into a small 
subatomic acceptor corresponds to an NP-complete problem. The NP certificates are solutions of 
simple equations involving relations over the syntactic monoid. We also consider the subclass of 
atomic nondeterministic automata introduced by Brzozowski and Tamm. Given a deterministic 
automaton and another one for the reversed language, computing small atomic acceptors is shown to 
be NP-complete with analogous certificates. Our complexity results emerge from an algebraic char- 
acterization of (sub)atomic acceptors in terms of deterministic automata with semilattice structure, 
combined with an equivalence of categories leading to succinct representations.

1 Introduction


Regular languages arise from a multitude of different perspectives: operationally via finite- 
state machines, model-theoretically via monadic second-order logic, and algebraically via 
finite monoids. In practice, deterministic finite automata (dfas) and nondeterministic finite 
automata (nfas) are two of the most common representations. Although the former may be 
exponentially larger than the latter, there is no known efficient procedure for converting dfas 
into small nfas, e.g. state-minimal ones. Jiang and Ravikumar proved the corresponding 
decision problem (does an equivalent nfa with a given number of states exist?) to be PSPACE- 
complete [14,15], suggesting that exhaustively enumerating candidates is necessary. One 
possible strategy towards tractability is to restrict the target automata to suitable subclasses 
of nfas. The challenge is to identify subclasses permitting more efficient computation (e.g. 
lowering the PSPACE bound to an NP bound, enabling the use of SAT solvers), while still 
being general enough to cover succinct acceptors of regular languages. 

In our present paper we will show that the class of subatomic nfas naturally meets the above 
requirements. An nfa accepting the language L is subatomic if each individual state accepts 
a union of syntactic congruence classes of L. In recent work [26] we observed that almost 
all known results on the structure of small nfas, e.g. for unary [6, 13], bideterministic [30],
topological [1] and biRFSA languages [19], implicitly construct small subatomic nfas. This
firmly indicates that the latter form a rich class of acceptors despite their seemingly restrictive 
definition, i.e. in many settings computing small nfas amounts to computing small subatomic 
ones. Restricting to subatomic nfas yields useful additional structure; in fact, their theory is 
tightly linked to the algebraic theory of regular languages and the representation theory of 
monoids. This suggests an algebraic counterpart of the dfa to nfa conversion problem: given 
a finite monoid recognizing some regular language, compute an equivalent small subatomic 
nfa. Denoting its decision version (does an equivalent subatomic nfa with a given number of
states exist?) by $\mathbf{MON} \to \mathbf{NFA}_{\mathrm{syn}}$, our main result is:


> Theorem. The problem $\mathbf{MON} \to \mathbf{NFA}_{\mathrm{syn}}$ is NP-complete.


In addition we also investigate atomic nfas, a subclass of subatomic nfas earlier introduced by 
Brzozowski and Tamm [4]. Similar to the subatomic case, their specific structure naturally 
invokes the problem of converting a pair of dfas accepting mutually reversed languages into 
a small atomic nfa. Denoting its decision version by $\mathbf{DFA} + \mathbf{DFA}^r \to \mathbf{NFA}_{\mathrm{atm}}$, we get:


> Theorem. The problem $\mathbf{DFA} + \mathbf{DFA}^r \to \mathbf{NFA}_{\mathrm{atm}}$ is NP-complete.


The short certificates witnessing that both problems are in NP are solutions of equations 
involving relations over the syntactic congruence or the Nerode left congruence, respectively. 
The above two theorems sharply contrast the PSPACE-completeness of the general dfa to 
nfa conversion problem, but also previous results on its sub-PSPACE variants. The latter are 
either concerned with particular regular languages such as finite or unary ones [11,13], or 
with target nfas admitting only very weak forms of nondeterminism, such as unambiguous 
automata [15] or dfas with multiple initial states [22]. In contrast, our present work applies 
to all regular languages and the restriction to (sub)atomic nfas is a purely semantic one. 
Our results are fundamentally based upon a category-theoretic perspective on atomic 
and subatomic acceptors. At its heart are two equivalences of categories as indicated below: 


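$$\mathsf{JSL}_f^{\mathrm{op}} \;\underset{\text{Structure theory}}{\xrightarrow{\;\simeq\;}}\; \mathsf{JSL}_f \;\underset{\text{Complexity theory}}{\xrightarrow{\;\simeq\;}}\; \mathsf{Dep}$$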


As shown in [26], the structure theory of (sub)atomic nfas emerges by interpreting them as 
dfas endowed with semilattice structure, and relating them to their dual automata under 
the familiar self-duality of the category $\mathsf{JSL}_f$ of finite semilattices. Similarly, the complexity
theory of (sub)atomic nfas developed in the present paper rests on the equivalence between
$\mathsf{JSL}_f$ and a category Dep (see Definition 3.1) that yields succinct relational representations
of finite semilattices by their irreducible elements. To derive the NP-completeness theorems, 
we reinterpret semilattice automata associated to (sub)atomic nfas inside Dep. We regard 
this conceptually simple and natural categorical approach as a key contribution of our paper. 


2 Atomic and Subatomic NFAs


We start by setting up the notation and terminology used in the rest of the paper, including 
the key concept of a (sub)atomic nfa that underlies our complexity results. Readers are
assumed to be familiar with basic category theory [21].


Semilattices. A (join-)semilattice is a poset $(S, \le_S)$ in which every finite subset $X \subseteq S$ has
a least upper bound (a.k.a. join) $\bigvee X$. A morphism between semilattices is a map preserving
finite joins. If $S$ is finite, as we often assume, every subset $X \subseteq S$ also has a greatest lower
bound (a.k.a. meet) $\bigwedge X$, given by the join of its lower bounds. In particular, $S$ has a least
element $\bot_S = \bigvee \emptyset$ and a greatest element $\top_S = \bigwedge \emptyset$. An element $j \in S$ is join-irreducible
if $j = \bigvee X$ implies $j \in X$ for every subset $X \subseteq S$. Dually, $m \in S$ is meet-irreducible if
$m = \bigwedge X$ implies $m \in X$. We put


$$J(S) = \{\, j \in S : j \text{ is join-irreducible} \,\} \quad\text{and}\quad M(S) = \{\, m \in S : m \text{ is meet-irreducible} \,\}.$$


Note $\bot_S \notin J(S)$ and $\top_S \notin M(S)$. The join-irreducibles form the least set of join-generators
of $S$, i.e. every element of $S$ is a join of elements from $J(S)$, and every other subset $J \subseteq S$
with that property contains $J(S)$. Dually, $M(S)$ is the least set of meet-generators of $S$.
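
The two notions of irreducibility can be made concrete with a minimal sketch. It assumes, purely for illustration, that a finite semilattice is given as a union-closed family of Python frozensets containing the empty set, so that the join is set union; the helper names are hypothetical and not taken from the paper. Note that the meet is computed as the join of all common lower bounds, which in general differs from the intersection.

```python
def join(elems):
    """Join of a collection of sets in a union-closed family: their union."""
    return frozenset().union(*elems)

def meet(S, elems):
    """Meet in S: the join of all common lower bounds (not the intersection)."""
    return join(s for s in S if all(s <= x for x in elems))

def join_irreducibles(S):
    # j is join-irreducible iff it is not the join of the elements strictly below it.
    return {j for j in S if join(s for s in S if s < j) != j}

def meet_irreducibles(S):
    # m is meet-irreducible iff it is not the meet of the elements strictly above it.
    return {m for m in S if meet(S, [s for s in S if s > m]) != m}

# Example family (an assumption for illustration): closed under union, bottom = {}.
S = {frozenset(x) for x in ["", "a", "b", "ab", "abc"]}
J = join_irreducibles(S)
M = meet_irreducibles(S)
```

In this example the bottom element is not join-irreducible and the top element is not meet-irreducible, in line with the note above.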

Let $2 = \{0, 1\}$ be the two-element semilattice with $0 < 1$. Morphisms $i\colon 2 \to S$
correspond to elements of $S$ via $i \mapsto i(1)$. Morphisms $f\colon S \to 2$ correspond to prime filters
via $f \mapsto f^{-1}[1]$. If $S$ is finite, these are precisely the subsets $F_{s_0} = \{ s \in S : s \not\le_S s_0 \}$ for any
$s_0 \in S$.

We denote by $\mathsf{JSL}$ the category of join-semilattices and their morphisms. Its full
subcategory $\mathsf{JSL}_f$ of finite semilattices is self-dual [17]: there is an equivalence functor


$$\mathsf{JSL}_f^{\mathrm{op}} \xrightarrow{\;\simeq\;} \mathsf{JSL}_f$$


mapping $(S, \le_S)$ to the opposite semilattice $S^{\mathrm{op}} = (S, \ge_S)$ obtained by reversing the order,
and a morphism $f\colon S \to T$ to the morphism $f_*\colon T^{\mathrm{op}} \to S^{\mathrm{op}}$ sending $t \in T$ to the $\le_S$-greatest
element $s \in S$ with $f(s) \le_T t$. Thus, $f$ and $f_*$ satisfy the adjoint relationship


$$f(s) \le_T t \iff s \le_S f_*(t)$$


for all $s \in S$ and $t \in T$. The morphism $f$ is injective (equivalently a $\mathsf{JSL}_f$-monomorphism)
iff $f_*$ is surjective (equivalently a $\mathsf{JSL}_f$-epimorphism).


Relations. A relation between sets $X$ and $Y$ is a subset $R \subseteq X \times Y$. We write $R(x, y)$ if
$(x, y) \in R$. For $x \in X$ and $A \subseteq X$ we put


$$R[x] = \{\, y \in Y : R(x, y) \,\} \quad\text{and}\quad R[A] = \bigcup_{x \in A} R[x].$$


The converse of $R$ is the relation $\breve{R} \subseteq Y \times X$ (alternatively $R^{\smile}$) where $\breve{R}(y, x)$ iff $R(x, y)$ for
$x \in X$ and $y \in Y$. The composite of $R \subseteq X \times Y$ and $S \subseteq Y \times Z$ is the relation $R; S \subseteq X \times Z$
where $(R; S)(x, z)$ iff there exists $y \in Y$ with $R(x, y)$ and $S(y, z)$. Let $\mathsf{Rel}$ denote the category
whose objects are sets and whose morphisms are relations with the above composition. The
identity morphism on $X$ is the identity relation $\mathrm{id}_X \subseteq X \times X$ with $\mathrm{id}_X(x, y)$ iff $x = y$.

A biclique of a relation $R \subseteq X \times Y$ is a subset of the form $B_1 \times B_2 \subseteq R$, where $B_1 \subseteq X$
and $B_2 \subseteq Y$. A set $C$ of bicliques forms a biclique cover if $R = \bigcup C$. The bipartite dimension
of $R$, denoted $\dim(R)$, is the minimum cardinality of any biclique cover.
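
For very small relations the bipartite dimension can be computed by exhaustive search. The following sketch is only an illustration under stated assumptions: relations are Python sets of pairs with sortable left components, and `bipartite_dimension` is a hypothetical helper name. It relies on the observation that every biclique $B_1 \times B_2 \subseteq R$ is contained in $B_1 \times \bigcap_{x \in B_1} R[x]$, so only bicliques of that shape need to be tried. The running time is exponential, so this is not a practical algorithm.

```python
from itertools import combinations

def image(R, x):
    """R[x]: all y related to x."""
    return frozenset(y for (a, y) in R if a == x)

def bipartite_dimension(R):
    """Minimum size of a biclique cover of R, by brute force."""
    R = set(R)
    if not R:
        return 0
    xs = sorted({x for x, _ in R})
    # Candidate bicliques: A x (common neighbours of A) for nonempty A.
    candidates = set()
    for r in range(1, len(xs) + 1):
        for A in combinations(xs, r):
            common = frozenset.intersection(*(image(R, x) for x in A))
            if common:
                candidates.add(frozenset((x, y) for x in A for y in common))
    candidates = list(candidates)
    # Try covers of increasing size; a cover by single rows always exists.
    for k in range(1, len(candidates) + 1):
        for cover in combinations(candidates, k):
            if set().union(*cover) == R:
                return k

identity3 = {(0, 0), (1, 1), (2, 2)}
full2 = {(0, 0), (0, 1), (1, 0), (1, 1)}
triangle = {(0, 0), (0, 1), (1, 1)}
```

The identity relation on $n$ elements needs $n$ bicliques, whereas a full relation is a single biclique.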


Languages. Let $\Sigma^*$ be the set of finite words over an alphabet $\Sigma$, including the empty
word $\varepsilon$. A language is a subset $L$ of $\Sigma^*$. We let $\overline{L} = \Sigma^* \setminus L$ denote the complement and
$L^r = \{ w^r : w \in L \}$ the reverse of $L$, where $\varepsilon^r = \varepsilon$ and $w^r = a_n \ldots a_1$ for $w = a_1 \ldots a_n$. The
left derivatives and two-sided derivatives of $L$ are, respectively, given by $u^{-1}L = \{ w \in \Sigma^* : uw \in L \}$
and $u^{-1}Lv^{-1} = \{ w \in \Sigma^* : uwv \in L \}$ for $u, v \in \Sigma^*$; moreover, for $U \subseteq \Sigma^*$ put
$U^{-1}L = \bigcup_{u \in U} u^{-1}L$. For each fixed $L \subseteq \Sigma^*$, the following sets of languages will play a
prominent role:


$$\mathrm{LD}(L) \subseteq \mathrm{SLD}(L) \subseteq \mathrm{BLD}(L) \subseteq \mathrm{BLRD}(L)$$

where $\mathrm{LD}(L) = \{ u^{-1}L : u \in \Sigma^* \}$ is the set of left derivatives, and $\mathrm{SLD}(L)$, $\mathrm{BLD}(L)$, $\mathrm{BLRD}(L)$
denote its closure under finite unions, all set-theoretic boolean operations, and all set-theoretic
boolean operations and two-sided derivatives, respectively. The final three form
$\cup$-semilattices, and the final two are boolean algebras w.r.t. the set-theoretic operations.


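A language $L$ is regular if $\mathrm{LD}(L)$ is a finite set; then the other three sets are finite too.
The finite semilattices $\mathrm{SLD}(L)$ and $\mathrm{SLD}(L^r)$ are related by the fundamental isomorphism


$$\mathrm{dr}_L\colon [\mathrm{SLD}(L^r)]^{\mathrm{op}} \xrightarrow{\;\cong\;} \mathrm{SLD}(L), \qquad K \mapsto (\overline{K^r})^{-1} L, \tag{2.1}$$


see [26, Proposition 3.13]. Equivalently, the map $\mathrm{dr}_L$ sends $V^{-1}L^r \in \mathrm{SLD}(L^r)$ to the largest
element of $\mathrm{SLD}(L)$ disjoint from $V^r$. It is closely connected to the dependency relation of $L$,


$$\mathrm{DR}_L \subseteq \mathrm{LD}(L) \times \mathrm{LD}(L^r), \qquad \mathrm{DR}_L(u^{-1}L, v^{-1}L^r) :\iff u v^r \in L \quad\text{for } u, v \in \Sigma^*. \tag{2.2}$$

In fact, by [26, Theorem 3.15] we have

$$\mathrm{DR}_L(u^{-1}L, v^{-1}L^r) \iff u^{-1}L \nsubseteq \mathrm{dr}_L(v^{-1}L^r) \quad\text{for } u, v \in \Sigma^*. \tag{2.3}$$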


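Since the boolean algebra $\mathrm{BLD}(L)$ is generated by the left derivatives of $L$, its atoms (=
join-irreducibles) are the congruence classes of the Nerode left congruence ${\sim_L} \subseteq \Sigma^* \times \Sigma^*$,


$$u \sim_L v \;:\iff\; \forall x \in \Sigma^*\colon \bigl(u \in x^{-1}L \Leftrightarrow v \in x^{-1}L\bigr) \;\iff\; (u^r)^{-1}L^r = (v^r)^{-1}L^r. \tag{2.4}$$


Note that this relation is left-invariant, i.e. $u \sim_L v$ implies $wu \sim_L wv$ for all $w \in \Sigma^*$.
Similarly, the atoms of $\mathrm{BLRD}(L)$ are the congruence classes of the syntactic congruence
${\equiv_L} \subseteq \Sigma^* \times \Sigma^*$, i.e. the monoid congruence on the free monoid $\Sigma^*$ defined by


$$u \equiv_L v \;:\iff\; \forall x, y \in \Sigma^*\colon \bigl(u \in x^{-1}Ly^{-1} \Leftrightarrow v \in x^{-1}Ly^{-1}\bigr). \tag{2.5}$$


The quotient monoid $\mathrm{syn}(L) = \Sigma^*/{\equiv_L}$ is called the syntactic monoid of $L$, and the canonical
map $\mu_L\colon \Sigma^* \to \mathrm{syn}(L)$ sending $u \in \Sigma^*$ to its congruence class $[u]_{\equiv_L}$ is the syntactic morphism.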


Automata. Fix a finite alphabet $\Sigma$. A nondeterministic finite automaton (a.k.a. nfa)
$N = (Q, \delta, I, F)$ consists of a finite set $Q$ (the states), relations $\delta = (\delta_a \subseteq Q \times Q)_{a \in \Sigma}$ (the
transitions), and sets $I, F \subseteq Q$ (the initial states and final states). We write $q_1 \xrightarrow{a} q_2$ whenever
$q_2 \in \delta_a[q_1]$. The language $L(N, q)$ accepted by a state $q \in Q$ consists of all words $w \in \Sigma^*$
such that $\delta_w[q] \cap F \neq \emptyset$, where $\delta_w \subseteq Q \times Q$ is the extended transition relation $\delta_{a_1}; \ldots; \delta_{a_n}$
for $w = a_1 \ldots a_n$ and $\delta_\varepsilon = \mathrm{id}_Q$. The language accepted by $N$ is defined as $L(N) = \bigcup_{i \in I} L(N, i)$.
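
The language accepted by a state (or a set of states) can be decided by iterating relational images along the word. The sketch below assumes an ad hoc encoding in which `delta` maps each letter to a set of state pairs; the function names and the example automaton (accepting the words over {a, b} that end in "ab") are illustrative assumptions, not notation from the paper.

```python
def image(rel, states):
    """R[A]: all q2 with (q1, q2) in rel for some q1 in A."""
    return {q2 for (q1, q2) in rel if q1 in states}

def accepts_from(delta, finals, start, word):
    """Decide whether `word` lies in L(N, X) for the set of states X = start."""
    current = set(start)
    for a in word:
        # One step of the extended transition relation delta_w.
        current = image(delta[a], current)
    return bool(current & finals)

def accepts(nfa, word):
    """Decide whether `word` lies in L(N), the union of L(N, i) over initial i."""
    delta, initial, finals = nfa
    return accepts_from(delta, finals, initial, word)

# Example nfa: states 0, 1, 2; initial {0}; final {2}.
delta = {"a": {(0, 0), (0, 1)}, "b": {(0, 0), (1, 2)}}
nfa = (delta, {0}, {2})
```

For instance, state 1 of this example accepts exactly the one-letter word "b".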

An nfa $N$ is a deterministic finite automaton (a.k.a. dfa) if $I = \{q_0\}$ is a singleton set
and each transition relation is a function $\delta_a\colon Q \to Q$. A dfa is a JSL-dfa if $Q$ is a finite
semilattice, each $\delta_a\colon Q \to Q$ is a semilattice morphism, and $F \subseteq Q$ forms a prime filter. It is
often useful to represent a JSL-dfa in terms of morphisms


$$2 \xrightarrow{\;i\;} Q \xrightarrow{\;\delta_a\;} Q \xrightarrow{\;f\;} 2$$


where $i$ is the unique morphism with $i(1) = q_0$ and $f$ is given by $f(q) = 1$ iff $q \in F$. A JSL-dfa
morphism from $A = (Q, \delta, i, f)$ to $A' = (Q', \delta', i', f')$ is a $\mathsf{JSL}_f$-morphism $h\colon Q \to Q'$
preserving transitions via $h \circ \delta_a = \delta'_a \circ h$, preserving the initial state via $i' = h \circ i$, and both
preserving and reflecting the final states via $f = f' \circ h$. Equivalently, $h$ is a dfa morphism
that is also a semilattice morphism, so in particular $L(A) = L(A')$. If $Q$ is a subsemilattice
of $Q'$ and $h\colon Q \to Q'$ is the inclusion map, then $A$ is called a sub JSL-dfa of $A'$.

Fix a regular language $L$. Viewed as a $\cup$-semilattice, $\mathrm{BLRD}(L)$ carries the structure of a
JSL-dfa with transitions $K \xrightarrow{a} a^{-1}K$, initial state $L$, and finals $\{K : \varepsilon \in K\}$. This restricts
to sub JSL-dfas $\mathrm{BLD}(L)$ and $\mathrm{SLD}(L)$. Moreover $\mathrm{LD}(L)$ forms a sub-dfa of $\mathrm{SLD}(L)$, well-known
[5] to be the state-minimal dfa for $L$, so we denote it by $\mathrm{dfa}(L)$. The syntactic monoid
$\mathrm{syn}(L)$ is isomorphic to the transition monoid of $\mathrm{dfa}(L)$, i.e. the monoid of all extended
transition maps $\delta_w\colon \mathrm{LD}(L) \to \mathrm{LD}(L)$ ($w \in \Sigma^*$) with multiplication given by composition [27].


Analogously SLD(L) is the state-minimal JSL-dfa for L. Up to isomorphism, it is the 
unique JSL-dfa for L that is JSL-reachable (i.e. every state is a join of states reachable from 
the initial state via transitions) and simple (i.e. distinct states accept distinct languages). 


Nfas, dfas and JSL-dfas are expressively equivalent and accept precisely the regular 
languages. In particular, to every JSL-dfa A = (Q, ô, qo, F) one can associate an equivalent 
nfa J(A), the nfa of join-irreducibles [1,2,25]. Its states are given by the set J(Q) of 
join-irreducibles of Q; for any qi,q2 E€ J(Q) and a € E there is a transition qı S q2 in J(A) 
iff q2 <Q alqı); a state q € J(Q) is initial iff q <s qo, and final iff q € F. For any q € J(Q), 
we have L(A,q) = L(J(A),q). The canonical residual finite state automaton [7] for a regular 
language L is given by Nz = J(SLD(Z)), the nfa of join-irreducibles of its minimal JSL-dfa. 


Atomic and subatomic nfas. An nfa accepting the language $L \subseteq \Sigma^*$ is called atomic [4]
if each state accepts a language from $\mathrm{BLD}(L)$, and subatomic [26] if each state accepts a
language from $\mathrm{BLRD}(L)$. The nondeterministic atomic complexity $\mathrm{natm}(L)$ of a regular
language $L$ is the least number of states of any atomic nfa accepting $L$. The nondeterministic
syntactic complexity $\mathrm{nsyn}(L)$ is the least number of states of any subatomic nfa accepting
$L$. Subatomic nfas are intimately connected to syntactic monoids: the atoms of $\mathrm{BLRD}(L)$
are the elements of $\mathrm{syn}(L)$, so an nfa accepting $L$ is subatomic iff its individual states
accept unions of syntactic congruence classes. Additionally $\mathrm{nsyn}(L)$ can be characterized via
boolean representations of $\mathrm{syn}(L)$, i.e. monoid morphisms $\rho\colon \mathrm{syn}(L) \to \mathsf{JSL}_f(S, S)$ into the
endomorphisms of a finite semilattice $S$ [26]. For a detailed exposition we refer to op. cit.


These complexity measures are related to the nondeterministic state complexity ns(L), 
i.e. the least number of states of any (unrestricted) nfa accepting L. In particular, 


$$\dim(\mathrm{DR}_L) \le \mathrm{ns}(L) \le \mathrm{nsyn}(L) \le \mathrm{natm}(L). \tag{2.6}$$


The first inequality is due to Gruber and Holzer [10] (see also [26, Theorem 4.8] for a purely 
algebraic proof), while the others arise by restricting admissible nondeterministic acceptors. 


Importantly, small atomic and subatomic nfas can be characterized in terms of JSL-dfas. 
The following theorem involves two commuting diagrams of semilattice morphisms, whose 
lower and upper paths are the canonical JSL-dfas described earlier. 


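> Theorem 2.1. Let $L \subseteq \Sigma^*$ be a regular language.


1. $\mathrm{natm}(L)$ is the least number $k$ such that there exists a finite semilattice $S$ with $|J(S)| \le k$
and $\mathsf{JSL}_f$-morphisms $p, q$ and $\tau_a$ ($a \in \Sigma$) making the left-hand diagram below commute.


2. $\mathrm{nsyn}(L)$ is the least number $k$ such that there exists a finite semilattice $S$ with $|J(S)| \le k$
and $\mathsf{JSL}_f$-morphisms $p, q$ and $\tau_a$ ($a \in \Sigma$) making the right-hand diagram below commute.

Left-hand diagram (atomic case):

$$\begin{array}{ccccccc}
2 & \xrightarrow{\;i'\;} & \mathrm{BLD}(L) & \xrightarrow{\;\delta'_a\;} & \mathrm{BLD}(L) & \xrightarrow{\;f'\;} & 2 \\
\| & & \uparrow{\scriptstyle q} & & \uparrow{\scriptstyle q} & & \| \\
2 & & S & \xrightarrow{\;\tau_a\;} & S & & 2 \\
\| & & \uparrow{\scriptstyle p} & & \uparrow{\scriptstyle p} & & \| \\
2 & \xrightarrow{\;i\;} & \mathrm{SLD}(L) & \xrightarrow{\;\delta_a\;} & \mathrm{SLD}(L) & \xrightarrow{\;f\;} & 2
\end{array}$$

The right-hand diagram (subatomic case) has the same shape, with the upper path replaced by
$2 \xrightarrow{i''} \mathrm{BLRD}(L) \xrightarrow{\delta''_a} \mathrm{BLRD}(L) \xrightarrow{f''} 2$.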


Proof. We only prove part (1), the proof of (2) being completely analogous.


Suppose there exists a finite semilattice $S$ with $|J(S)| = k$ and $\mathsf{JSL}_f$-morphisms $p, q$ and
$(\tau_a)_{a \in \Sigma}$ making the left diagram commute. Then $A = (S, \tau, p \circ i, f' \circ q)$ is a JSL-dfa and
$p\colon \mathrm{SLD}(L) \to A$ and $q\colon A \to \mathrm{BLD}(L)$ are JSL-dfa morphisms. Since JSL-dfa morphisms
preserve the accepted language, and every state $K \in \mathrm{BLD}(L)$ accepts the language $K$, it
follows that $A$ accepts $L$ and every state of $A$ accepts a language from $\mathrm{BLD}(L)$. Thus the
nfa $J(A)$ of join-irreducibles corresponding to $A$ is an atomic nfa for $L$ with $k$ states.


Conversely, assume $N = (Q, \delta, I, F)$ is a $k$-state atomic nfa accepting $L$. Form the
$\cup$-semilattice $S = \mathrm{langs}(N)$ of all languages $L(N, X)$ accepted by subsets $X \subseteq Q$. Note that
$\mathrm{SLD}(L) \subseteq S \subseteq \mathrm{BLD}(L)$: the first inclusion holds because $u^{-1}L = L(N, \delta_u[I]) \in S$ for every
$u \in \Sigma^*$, and the second one because $N$ is atomic. We define the semilattice endomorphisms


$$\tau_a\colon S \to S \quad\text{by}\quad \tau_a(K) = a^{-1}K \quad\text{for } K \in S.$$


Letting $p\colon \mathrm{SLD}(L) \to S$ and $q\colon S \to \mathrm{BLD}(L)$ denote the inclusions, the left diagram
commutes. Moreover $|J(S)| \le k$ since $S$ is join-generated by the elements $L(N, q)$ for
$q \in Q$. ◀


3 Representing Finite Semilattices as Finite Relations


We have seen that atomic and subatomic nfas amount to certain dfas with semilattice structure. 
To obtain our NP-completeness results concerning the computation of small (sub)atomic 
acceptors we will study succinct representations of the corresponding $\mathsf{JSL}_f$-diagrams from
Theorem 2.1. For this purpose, we start with the following key observation:


Any finite semilattice $S$ is completely determined by its poset of irreducibles [23], i.e.
the relation ${\not\le_S} \subseteq J(S) \times M(S)$ between join-irreducibles and meet-irreducibles.


We now prove that this extends to an equivalence between the category $\mathsf{JSL}_f$ of finite
semilattices and another category called Dep. Its objects are the relations between finite sets 
and its morphisms represent semilattice morphisms as relations. The equivalence is inspired 
by Moshier’s categories of contexts [16,24] and will serve as the conceptual basis of our work. 


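> Definition 3.1 (The category of dependency relations). The objects of the category Dep
are the relations $R \subseteq R_s \times R_t$ between finite sets. Far less obviously,


a morphism $P\colon R \to S$ is a relation $P \subseteq R_s \times S_t$ that factorizes through $R$ and $S$, i.e.
the left Rel-diagram below commutes for some $P_l \subseteq R_s \times S_s$ and $P_u \subseteq S_t \times R_t$.


The identity morphism for $R$ is $\mathrm{id}_R = R$, see the central diagram below. The composite
$P \fatsemi Q\colon R \to T$ of $P\colon R \to S$ and $Q\colon S \to T$ is any of the five equivalent relational compositions
starting from the bottom left corner and ending at the top right corner of the rightmost
diagram below; that is, $P \fatsemi Q := P_l; Q_l; T = P_l; Q = P_l; S; \breve{Q_u} = P; \breve{Q_u} = R; \breve{P_u}; \breve{Q_u}$. (Note
that we use the symbol $\fatsemi$ for composition in Dep and $;$ for composition in Rel, and recall
that $\breve{(-)}$ denotes the converse relation.)


[Diagrams. Left: the square with bottom edge $P_l\colon R_s \to S_s$, top edge $\breve{P_u}\colon R_t \to S_t$, vertical
edges $R$ and $S$, and diagonal $P$, i.e. $P_l; S = P = R; \breve{P_u}$. Centre: the same square for $\mathrm{id}_R = R$,
with both witnesses identity relations. Right: the squares for $P$ and $Q$ pasted side by side,
running from $R_s$ (bottom left) to $T_t$ (top right).]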


One readily verifies that Dep is a well-defined category; in particular, the composition is 
independent of the choice of the lower and upper witnesses (—); and (—)u. 


> Remark 3.2.

1. Using the converse upper witness may seem strange. Although technically unnecessary,
it fits the self-duality of Dep taking the converse on objects and morphisms. Moreover
$f; {\not\le_T} = {\not\le_S}; \breve{f_*}$ for any $\mathsf{JSL}_f$-morphism $f\colon S \to T$ via the adjoint relationship; that is, $f$
induces a Dep-morphism from $\not\le_S$ to $\not\le_T$ with lower witness $f$ and upper witness $f_*$.

2. The witnesses of a Dep-morphism $P\colon R \to S$ are closed under unions. The maximal
lower witness $P_- \subseteq R_s \times S_s$ is given by


$$P_-(x, y) :\iff S[y] \subseteq P[x] \quad\text{for } x \in R_s,\ y \in S_s,$$
and the maximal upper witness $P_+ \subseteq S_t \times R_t$ by
$$P_+(y, x) :\iff \breve{R}[x] \subseteq \breve{P}[y] \quad\text{for } x \in R_t,\ y \in S_t.$$


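> Theorem 3.3 (Fundamental equivalence). The categories $\mathsf{JSL}_f$ and $\mathsf{Dep}$ are equivalent.


1. The equivalence functor $\mathrm{Pirr}\colon \mathsf{JSL}_f \to \mathsf{Dep}$ maps a finite semilattice $S$ to the Dep-object
$$\mathrm{Pirr}(S) := {\not\le_S} \subseteq J(S) \times M(S),$$
and a $\mathsf{JSL}_f$-morphism $f\colon S \to T$ to the Dep-morphism
$$\mathrm{Pirr}(f)\colon \mathrm{Pirr}(S) \to \mathrm{Pirr}(T), \qquad \mathrm{Pirr}(f)(j, m) :\iff f(j) \not\le_T m \quad\text{for } j \in J(S),\ m \in M(T).$$
2. The inverse $\mathrm{Open}\colon \mathsf{Dep} \to \mathsf{JSL}_f$ maps a Dep-object $R$ to its semilattice of open sets
$$\mathrm{Open}(R) := (\{\, R[X] : X \subseteq R_s \,\}, \subseteq),$$
and a Dep-morphism $P\colon R \to S$ to the $\mathsf{JSL}_f$-morphism
$$\mathrm{Open}(P)\colon \mathrm{Open}(R) \to \mathrm{Open}(S), \qquad \mathrm{Open}(P)(O) := \breve{P_+}[O] \quad\text{for } O \in \mathrm{Open}(R),$$
where $P_+ \subseteq S_t \times R_t$ is the maximal upper witness of $P$.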
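
On objects, the functor Open is easy to compute for small inputs. The following sketch assumes a relation is given as a Python set of pairs together with an explicit list of its source elements; `open_sets` is a hypothetical helper name. It builds all unions of images $R[x]$, which are exactly the sets $R[X]$ for $X \subseteq R_s$. The result can be exponentially larger than the relation, which is one way to see why relations serve as succinct representations.

```python
def image(R, x):
    """R[x]: all y related to x."""
    return frozenset(y for (a, y) in R if a == x)

def open_sets(R, source):
    """All sets R[X] for X a subset of `source`, i.e. all unions of images R[x]."""
    opens = {frozenset()}              # R[X] for the empty set X
    for x in source:
        img = image(R, x)
        # Adding x to X enlarges R[X] by R[x].
        opens |= {o | img for o in opens}
    return opens

R = {(0, "a"), (1, "b"), (2, "a"), (2, "b")}
opens = open_sets(R, [0, 1, 2])

identity3 = {(0, 0), (1, 1), (2, 2)}
opens_id = open_sets(identity3, [0, 1, 2])
```

Here the three-element identity relation yields all eight subsets of its target, ordered by inclusion.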


> Remark 3.4. In the definition of $\mathrm{Pirr}(S)$ one may replace $J(S)$ and $M(S)$ by any two
sets $J, M \subseteq S$ of join- and meet-generators modulo Dep-isomorphism. Indeed, since the
equivalence functor Open reflects isomorphisms, this follows immediately from the $\mathsf{JSL}_f$-isomorphism
$\mathrm{Open}({\not\le_S} \cap J \times M) \cong \mathrm{Open}({\not\le_S} \cap J(S) \times M(S))$ given by $O \mapsto O \cap M(S)$.


> Remark 3.5. Bijectively relabeling the domain and codomain of a relation defines a 
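Dep-isomorphism, the witnesses being the relabelings.

We now show that for every regular language $L$, the semilattices $\mathrm{SLD}(L)$, $\mathrm{BLD}(L)$ and
$\mathrm{BLRD}(L)$ equipped with their canonical JSL-dfa structure (see Section 2) translate under
the equivalence functor Pirr into familiar concepts from automata theory. The translations
are summarized in Table 1 and explained in Examples 3.6-3.8 below.


Table 1 Canonical JSL-dfas and their corresponding Dep-structures.


$$\begin{array}{ll}
\mathsf{JSL}_f & \mathsf{Dep} \\ \hline
2 \xrightarrow{i} \mathrm{SLD}(L) \xrightarrow{\delta_a} \mathrm{SLD}(L) \xrightarrow{f} 2 & \mathrm{id}_1 \xrightarrow{\mathcal{I}} \mathrm{DR}_L \xrightarrow{\mathcal{DR}_{L,a}} \mathrm{DR}_L \xrightarrow{\mathcal{F}} \mathrm{id}_1 \\
2 \xrightarrow{i'} \mathrm{BLD}(L) \xrightarrow{\delta'_a} \mathrm{BLD}(L) \xrightarrow{f'} 2 & \mathrm{id}_1 \xrightarrow{\mathcal{I}'} \mathrm{id}_{\Sigma^*/\sim_L} \xrightarrow{\mathcal{D}'_a} \mathrm{id}_{\Sigma^*/\sim_L} \xrightarrow{\mathcal{F}'} \mathrm{id}_1 \\
2 \xrightarrow{i''} \mathrm{BLRD}(L) \xrightarrow{\delta''_a} \mathrm{BLRD}(L) \xrightarrow{f''} 2 & \mathrm{id}_1 \xrightarrow{\mathcal{I}''} \mathrm{id}_{\mathrm{syn}(L)} \xrightarrow{\mathcal{D}''_a} \mathrm{id}_{\mathrm{syn}(L)} \xrightarrow{\mathcal{F}''} \mathrm{id}_1
\end{array}$$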


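> Example 3.6 (State-minimal JSL-dfa vs. dependency relation $\mathrm{DR}_L$). Let us start with
the observation that $\mathrm{SLD}(L)$ is join-generated by $\mathrm{LD}(L)$ and meet-generated by $\mathrm{dr}_L[\mathrm{LD}(L^r)]$.
The latter follows via the fundamental isomorphism (2.1). Then


$$\mathrm{Pirr}(\mathrm{SLD}(L))(u^{-1}L, \mathrm{dr}_L(v^{-1}L^r)) \iff u^{-1}L \nsubseteq \mathrm{dr}_L(v^{-1}L^r) \iff \mathrm{DR}_L(u^{-1}L, v^{-1}L^r)$$


for every $u^{-1}L \in J(\mathrm{SLD}(L))$ and $v^{-1}L^r \in J(\mathrm{SLD}(L^r))$, where the second equivalence is (2.3). Thus,
$\mathrm{Pirr}(\mathrm{SLD}(L))$ is a bijective relabeling of $\mathrm{DR}_L$ restricted to $J(\mathrm{SLD}(L)) \times J(\mathrm{SLD}(L^r))$.


By Remark 3.4 we know $\mathrm{Pirr}(\mathrm{SLD}(L))$ is isomorphic to the domain-codomain extension
${\nsubseteq} \subseteq \mathrm{LD}(L) \times \mathrm{dr}_L[\mathrm{LD}(L^r)]$ and thus also to the dependency relation $\mathrm{DR}_L$ by Remark
3.5. Then the JSL-dfa structure of the semilattice $\mathrm{SLD}(L)$ translates into the category of
dependency relations as shown in Table 1, where $\mathrm{id}_1$ is the identity relation on $1 = \{*\}$ and


$$\begin{aligned}
&\mathcal{I} \subseteq 1 \times \mathrm{LD}(L^r), && \mathcal{DR}_{L,a} \subseteq \mathrm{LD}(L) \times \mathrm{LD}(L^r), && \mathcal{F} \subseteq \mathrm{LD}(L) \times 1, \\
&\mathcal{I}(*, v^{-1}L^r) :\iff v^r \in L, && \mathcal{DR}_{L,a}(u^{-1}L, v^{-1}L^r) :\iff u a v^r \in L, && \mathcal{F}(u^{-1}L, *) :\iff u \in L.
\end{aligned}$$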


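> Example 3.7 ($\mathrm{BLD}(L)$ vs. the Nerode left congruence $\sim_L$). In Section 2 we observed that
the atoms of the boolean algebra $\mathrm{BLD}(L)$ are the congruence classes of the Nerode left
congruence. Then the co-atoms are their relative complements, and


$$\mathrm{Pirr}(\mathrm{BLD}(L))([u]_{\sim_L}, \overline{[v]_{\sim_L}}) \iff [u]_{\sim_L} \nsubseteq \overline{[v]_{\sim_L}} \iff [u]_{\sim_L} = [v]_{\sim_L}.$$


By Remark 3.5, we see that $\mathrm{BLD}(L)$ corresponds to the Dep-object $\mathrm{id}_{\Sigma^*/\sim_L}$, and its JSL-dfa
structure translates into the category of dependency relations as indicated in Table 1, where

$$\begin{aligned}
&\mathcal{I}' \subseteq 1 \times \Sigma^*/{\sim_L}, && \mathcal{D}'_a \subseteq \Sigma^*/{\sim_L} \times \Sigma^*/{\sim_L}, && \mathcal{F}' \subseteq \Sigma^*/{\sim_L} \times 1, \\
&\mathcal{I}'(*, [u]_{\sim_L}) :\iff u \in L, && \mathcal{D}'_a([u]_{\sim_L}, [v]_{\sim_L}) :\iff [v]_{\sim_L} \subseteq a^{-1}[u]_{\sim_L}, && \mathcal{F}'([u]_{\sim_L}, *) :\iff u \sim_L \varepsilon.
\end{aligned}$$


We note that the above relations induce an nfa
$(\Sigma^*/{\sim_L}, (\mathcal{D}'_a)_{a \in \Sigma}, \mathcal{I}'[*], \breve{\mathcal{F}'}[*])$ known as the átomaton for the language $L$ [4].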


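> Example 3.8 ($\mathrm{BLRD}(L)$ vs. the syntactic monoid $\mathrm{syn}(L)$). Analogously, the boolean algebra
$\mathrm{BLRD}(L)$ corresponds to the Dep-object $\mathrm{id}_{\mathrm{syn}(L)}$. Its semilattice dfa structure translates into
the category of dependency relations as shown in Table 1, where


$$\begin{aligned}
&\mathcal{I}'' \subseteq 1 \times \mathrm{syn}(L), && \mathcal{D}''_a \subseteq \mathrm{syn}(L) \times \mathrm{syn}(L), && \mathcal{F}'' \subseteq \mathrm{syn}(L) \times 1, \\
&\mathcal{I}''(*, [u]_{\equiv_L}) :\iff u \in L, && \mathcal{D}''_a([u]_{\equiv_L}, [v]_{\equiv_L}) :\iff [v]_{\equiv_L} \subseteq a^{-1}[u]_{\equiv_L}, && \mathcal{F}''([u]_{\equiv_L}, *) :\iff u \equiv_L \varepsilon.
\end{aligned}$$

We conclude this section with two lemmas establishing important properties of the
equivalence. The first concerns the bipartite dimension of relations (see Section 2):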


> Lemma 3.9. Let $R$ be a relation between finite sets.

1. $\dim(R)$ is the least $|J(S)|$ of any injective $\mathsf{JSL}_f$-morphism $m\colon \mathrm{Open}(R) \to S$.

2. $\dim(R)$ is invariant under isomorphism, i.e. $R \cong S$ in Dep implies $\dim(R) = \dim(S)$.

The second explicitly describes the join- and meet-irreducibles of the semilattice $\mathrm{Open}(R)$.

> Notation 3.10. For $R \subseteq R_s \times R_t$ we define the following operator on the power set of $R_t$:
$$\mathrm{in}_R\colon \mathcal{P}(R_t) \to \mathcal{P}(R_t), \qquad Y \mapsto \bigcup \{\, R[X] : X \subseteq R_s \text{ and } R[X] \subseteq Y \,\}.$$

Thus, $\mathrm{in}_R(Y)$ is the largest open set of $R$ contained in $Y \subseteq R_t$.


> Lemma 3.11. Let $R \subseteq R_s \times R_t$ be a relation between finite sets.

1. $J(\mathrm{Open}(R))$ consists of all sets $R[x]$ ($x \in R_s$) that cannot be expressed as a union of
smaller such sets, i.e. $R[x] = \bigcup_{i \in I} R[x_i]$ implies $R[x] = R[x_i]$ for some $i \in I$.

2. $M(\mathrm{Open}(R))$ consists of all sets $\mathrm{in}_R(R_t \setminus \{y\})$ such that $\breve{R}[y]$ lies in $J(\mathrm{Open}(\breve{R}))$.
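
Both parts of the lemma translate into a direct computation on the relation, without building the whole semilattice of open sets. The sketch below is illustrative only: relations are sets of pairs, `source` and `target` list $R_s$ and $R_t$ explicitly, and all function names are hypothetical. Part (2) is read with the converse relation, i.e. the condition on $y$ is tested on the images of the converse of $R$.

```python
def image(R, x):
    """R[x]: all y related to x."""
    return frozenset(y for (a, y) in R if a == x)

def converse(R):
    return {(y, x) for (x, y) in R}

def irreducible_images(R, source):
    """J(Open(R)): images R[x] that are not unions of strictly smaller images."""
    images = {image(R, x) for x in source}
    return {i for i in images
            if frozenset().union(*(s for s in images if s < i)) != i}

def interior(R, source, Y):
    """in_R(Y): the largest open set of R contained in Y."""
    return frozenset().union(*(image(R, x) for x in source if image(R, x) <= Y))

def meet_irreducible_opens(R, source, target):
    """M(Open(R)): interiors of co-singletons for suitable y, as in part (2)."""
    Rc = converse(R)
    J_conv = irreducible_images(Rc, target)
    return {interior(R, source, frozenset(target) - {y})
            for y in target if image(Rc, y) in J_conv}

# Example: Open(R) is the three-element chain {} < {a} < {a, b}.
R = {(0, "a"), (1, "a"), (1, "b")}
J = irreducible_images(R, [0, 1])
M = meet_irreducible_opens(R, [0, 1], ["a", "b"])
```

For the chain in this example, the join-irreducibles are the two non-bottom elements and the meet-irreducibles are the two non-top elements.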


4 Nuclear Languages and Lattice Languages 


As a further technical tool, we now introduce two classes of regular languages. They are 
well-behaved w.r.t. their small nfas and will emerge at the heart of our NP-completeness 
proofs in Section 5. Their definition rests on the notion of a nuclear morphism in $\mathsf{JSL}_f$,
originating from the theory of symmetric monoidal closed categories [12,28]. Recall that a
finite semilattice is a distributive lattice if $x \wedge (y \vee z) = (x \wedge y) \vee (x \wedge z)$ for all elements
$x, y, z$.


> Definition 4.1 (Nuclear language). A $\mathsf{JSL}_f$-morphism $f\colon S \to T$ is nuclear if it factorizes
through a finite distributive lattice, i.e. $f = (S \xrightarrow{g} D \xrightarrow{h} T)$ for some finite distributive
lattice $D$ and $\mathsf{JSL}_f$-morphisms $g, h$. A regular language $L \subseteq \Sigma^*$ is nuclear if the transition
morphisms $\delta_a = a^{-1}(-)\colon \mathrm{SLD}(L) \to \mathrm{SLD}(L)$ ($a \in \Sigma$) of its minimal JSL-dfa are nuclear.

> Example 4.2 (BiRFSA languages). A regular language $L$ is biRFSA [19] if $(N_L)^r \cong N_{L^r}$,
that is, the canonical residual finite state automata for $L$ and $L^r$ (see Section 2) are reverse-isomorphic.
In [26, Example 5.7] we proved that the biRFSA languages are precisely those
whose semilattice $\mathrm{SLD}(L)$ is distributive. Thus biRFSA languages are nuclear.


There is a natural subclass of nuclear languages which need not be biRFSA: 
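> Definition 4.3 (Lattice language). For any $S \in \mathsf{JSL}_f$ we define the language $L(S) \subseteq \Sigma^*$,

$$\Sigma := \{\, \langle j| : j \in J(S) \,\} \cup \{\, |m\rangle : m \in M(S) \,\} \quad\text{and}\quad L(S) := \bigcap_{j \le_S m} \overline{\Sigma^* \, \langle j| \, |m\rangle \, \Sigma^*}.$$

Then $\Sigma$ is the disjoint union of $J(S)$ and $M(S)$ (with the notation $\langle j|$ and $|m\rangle$ used to
distinguish between elements of the two summands), and $L(S)$ consists of all words over $\Sigma$
not containing any factor $\langle j| \, |m\rangle$ with $j \le_S m$.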
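
Membership in a lattice language only requires scanning adjacent letter pairs. As a minimal sketch, assume a letter $\langle j|$ is encoded as the tuple `("j", j)`, a letter $|m\rangle$ as `("m", m)`, and the order of $S$ is supplied as a predicate `leq`; these encodings and the function name are illustrative assumptions. The example uses the chain $0 < 1 < 2$, whose join-irreducibles are 1, 2 and whose meet-irreducibles are 0, 1.

```python
def in_lattice_language(word, leq):
    """word: list of letters ("j", x) or ("m", y); leq(x, y) decides x <=_S y."""
    for first, second in zip(word, word[1:]):
        # A forbidden factor is a j-letter immediately followed by an m-letter above it.
        if first[0] == "j" and second[0] == "m" and leq(first[1], second[1]):
            return False
    return True

def chain_leq(x, y):
    """Order of the chain 0 < 1 < 2 (example semilattice)."""
    return x <= y

allowed = [("j", 2), ("m", 1)]          # 2 is not below 1, so this factor is permitted
forbidden = [("j", 1), ("m", 1)]        # 1 <= 1, so this factor is excluded
```

Only the order in which a join-irreducible letter is followed by a meet-irreducible letter matters; the reversed pair never forms a forbidden factor.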
> Lemma 4.4. For any $S \in \mathsf{JSL}_f$, the language $L(S)$ is nuclear and $S \cong \mathrm{SLD}(L(S))$.

Crucially, for nuclear and lattice languages some of the relations (2.6) hold with equality:

> Proposition 4.5.
1. If $L$ is a nuclear language then $\mathrm{ns}(L) = \dim(\mathrm{DR}_L)$.
2. If $L = L(S)$ is a lattice language then $\mathrm{natm}(L) = \mathrm{nsyn}(L) = \mathrm{ns}(L) = \dim(\mathrm{DR}_L)$.

These equalities are the key fact making our reductions in the next section work.

5 Complexity of Computing Small (Sub)Atomic Acceptors


We are ready to present our main complexity results on small (sub)atomic nfas. First we 
consider the slightly simpler atomic case, phrased as the following decision problem: 


$\mathbf{DFA} + \mathbf{DFA}^r \to \mathbf{NFA}_{\mathrm{atm}}$
Input: Two dfas $A$ and $B$ such that $L(A) = L(B)^r$ and a natural number $k$.
Task: Decide whether there exists a $k$-state atomic nfa equivalent to $A$, i.e. $\mathrm{natm}(L(A)) \le k$.


> Remark 5.1. Taking mutually reverse dfas $(A, B)$ as input permits an efficient computation
of the dependency relation $\mathrm{DR}_L \subseteq \mathrm{LD}(L) \times \mathrm{LD}(L^r)$ of $L = L(A)$. One may assume $A$ and
$B$ are minimal dfas, so that their state sets $Q_A$ and $Q_B$ are in bijective correspondence
with $\mathrm{LD}(L)$ and $\mathrm{LD}(L^r)$. For $p \in Q_A$ choose some $w_A(p) \in \Sigma^*$ sending the initial state to $p$;
analogously choose $w_B(q) \in \Sigma^*$ for $q \in Q_B$. Then $\mathrm{DR}_L$ is a bijective relabeling of


$$\widetilde{\mathrm{DR}}_L \subseteq Q_A \times Q_B \quad\text{where}\quad \widetilde{\mathrm{DR}}_L(p, q) :\iff A \text{ accepts } w_A(p)\, w_B(q)^r,$$


so it is computable in polynomial time from $A$ and $B$. A completely analogous argument
applies to the relations $\mathcal{I}$, $\mathcal{DR}_{L,a}$ and $\mathcal{F}$ from Example 3.6.
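
The computation described in this remark can be sketched as follows. The encoding is an assumption made for illustration: a dfa is a dictionary with keys `init`, `finals` and `delta`, where `delta` maps pairs (state, letter) to states, and both dfas are taken to be complete and minimal. Access words are found by breadth-first search. The example pair accepts the words over {a, b} ending in b and, for the reverse, the words starting with b.

```python
from collections import deque

def run(dfa, word):
    """State reached from the initial state after reading `word`."""
    state = dfa["init"]
    for a in word:
        state = dfa["delta"][state, a]
    return state

def access_words(dfa):
    """Breadth-first search for one word w(p) leading from the initial state to each p."""
    words = {dfa["init"]: ""}
    queue = deque([dfa["init"]])
    while queue:
        p = queue.popleft()
        for (source, a), target in dfa["delta"].items():
            if source == p and target not in words:
                words[target] = words[p] + a
                queue.append(target)
    return words

def dependency_relation(A, B):
    """Pairs (p, q) such that A accepts w_A(p) followed by the reverse of w_B(q)."""
    wA, wB = access_words(A), access_words(B)
    return {(p, q) for p in wA for q in wB
            if run(A, wA[p] + wB[q][::-1]) in A["finals"]}

# A accepts the words ending in b; B accepts the words starting with b.
A = {"init": 0, "finals": {1},
     "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}
B = {"init": "s", "finals": {"y"},
     "delta": {("s", "a"): "n", ("s", "b"): "y", ("y", "a"): "y",
               ("y", "b"): "y", ("n", "a"): "n", ("n", "b"): "n"}}
DR = dependency_relation(A, B)
```

Since both dfas are minimal, the result does not depend on which access words the search happens to pick.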


> Theorem 5.2. The problem $\mathbf{DFA} + \mathbf{DFA}^r \to \mathbf{NFA}_{\mathrm{atm}}$ is NP-complete.


We establish the upper and lower bound separately in the next two propositions. Both their
proofs are based on the fundamental equivalence between $\mathsf{JSL}_f$ and $\mathsf{Dep}$.


> Proposition 5.3. The problem $\mathbf{DFA} + \mathbf{DFA}^r \to \mathbf{NFA}_{\mathrm{atm}}$ is in NP.


Proof. 


1. One can check in polynomial time whether a given pair $(A, B)$ of dfas forms a valid
input, i.e. satisfies $L(A) = L(B)^r$. In fact, this condition is equivalent to
$L(A) \cap \overline{L(B)}^r = L(B) \cap \overline{L(A)}^r = \emptyset$. Using the standard methods for complementing dfas and reversing
and intersecting nfas, one can construct nfas for $L(A) \cap \overline{L(B)}^r$ and $L(B) \cap \overline{L(A)}^r$ of size
polynomial in $|A|$ and $|B|$, the number of states of $A$ and $B$, and check for emptiness by
verifying that no final state is reachable from the initial states.


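2. Let $A$ and $B$ be dfas accepting the languages $L$ and $L^r$, respectively, and let $k$ be a
natural number. We claim the following three statements to be equivalent:
a. There exists an atomic nfa accepting $L$ with at most $k$ states.
b. There exists a finite semilattice $S$ with $|J(S)| \le k$ and $\mathsf{JSL}_f$-morphisms $p, q$ and $\tau_a$
($a \in \Sigma$) making the left diagram below commute.
c. There exists a Dep-object $\mathcal{S} \subseteq \mathcal{S}_s \times \mathcal{S}_t$ with $|\mathcal{S}_s| \le k$ and $|\mathcal{S}_t| \le |B|$ and Dep-morphisms
$\mathcal{P}$, $\mathcal{Q}$ and $\mathcal{T}_a$ ($a \in \Sigma$) making the right diagram below commute (cf. Example 3.6/3.7).


$$\begin{array}{ccccccc}
2 & \xrightarrow{\;i'\;} & \mathrm{BLD}(L) & \xrightarrow{\;\delta'_a\;} & \mathrm{BLD}(L) & \xrightarrow{\;f'\;} & 2 \\
\| & & \uparrow{\scriptstyle q} & & \uparrow{\scriptstyle q} & & \| \\
2 & & S & \xrightarrow{\;\tau_a\;} & S & & 2 \\
\| & & \uparrow{\scriptstyle p} & & \uparrow{\scriptstyle p} & & \| \\
2 & \xrightarrow{\;i\;} & \mathrm{SLD}(L) & \xrightarrow{\;\delta_a\;} & \mathrm{SLD}(L) & \xrightarrow{\;f\;} & 2
\end{array}
\qquad
\begin{array}{ccccccc}
\mathrm{id}_1 & \xrightarrow{\;\mathcal{I}'\;} & \mathrm{id}_{\Sigma^*/\sim_L} & \xrightarrow{\;\mathcal{D}'_a\;} & \mathrm{id}_{\Sigma^*/\sim_L} & \xrightarrow{\;\mathcal{F}'\;} & \mathrm{id}_1 \\
\| & & \uparrow{\scriptstyle \mathcal{Q}} & & \uparrow{\scriptstyle \mathcal{Q}} & & \| \\
\mathrm{id}_1 & & \mathcal{S} & \xrightarrow{\;\mathcal{T}_a\;} & \mathcal{S} & & \mathrm{id}_1 \\
\| & & \uparrow{\scriptstyle \mathcal{P}} & & \uparrow{\scriptstyle \mathcal{P}} & & \| \\
\mathrm{id}_1 & \xrightarrow{\;\mathcal{I}\;} & \mathrm{DR}_L & \xrightarrow{\;\mathcal{DR}_{L,a}\;} & \mathrm{DR}_L & \xrightarrow{\;\mathcal{F}\;} & \mathrm{id}_1
\end{array} \tag{5.1}$$


In fact, (a)$\iff$(b) was shown in Theorem 2.1(1), and (b)$\iff$(c) follows from the equivalence
between $\mathsf{JSL}_f$ and $\mathsf{Dep}$. To see this, note that in the left diagram we may assume $q$ to
be injective; otherwise, factorize $q$ as $q = q' \circ e$ with $e$ surjective and $q'$ injective and
work with $q'$ instead of $q$. By the self-duality of $\mathsf{JSL}_f$, dualizing $q$ yields a surjective
morphism from $\mathrm{BLD}(L) \cong \mathrm{BLD}(L)^{\mathrm{op}}$ to $S^{\mathrm{op}}$. Thus,


$$|M(S)| = |J(S^{\mathrm{op}})| \le |J(\mathrm{BLD}(L))| = |\Sigma^*/{\sim_L}| = |\mathrm{LD}(L^r)| \le |B|.$$


In the two last steps, we use that the congruence classes of $\sim_L$ correspond bijectively to
left derivatives of $L^r$ by (2.4), and that $\mathrm{LD}(L^r)$ is the set of states of the minimal dfa for
$L^r$.

By Example 3.6 and 3.7 the upper and lower path of the left diagram in $\mathsf{JSL}_f$ correspond
under the equivalence functor Pirr to the upper and lower path of the right diagram in
Dep. Therefore, Theorem 3.3 shows the two diagrams to be equivalent.


3. From (a)$\iff$(c) we deduce that the relations $\mathcal{S}$, $\mathcal{P}$, $\mathcal{Q}$ and $\mathcal{T}_a$ ($a \in \Sigma$) constitute a short
certificate for the existence of an atomic nfa for $L$ with at most $k$ states. Commutativity
of the right diagram can be checked in polynomial time because all the relations appearing
in the upper and lower path can be efficiently computed from the given dfas $A$ and $B$.
Indeed, for the lower path we have already noted this in Remark 5.1, and the upper path
emerges from the minimal dfa for $L^r$, using that $\Sigma^*/{\sim_L} \cong \mathrm{LD}(L^r)$. ◀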
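
The input validation in step 1 of the proof can be sketched as a reachability search in a product automaton. This is an illustration under assumptions, not the authors' implementation: dfas are complete and encoded as dictionaries with keys `init`, `finals` and `delta` (pairs (state, letter) mapped to states), and the function names are hypothetical. The nfa for the reversed complement of $L(B)$ starts in the non-final states of $B$, follows the transitions of $B$ backwards, and accepts in the initial state of $B$.

```python
from collections import deque

def meets_reversed_complement(A, B, alphabet):
    """True iff L(A) intersects the reverse of the complement of L(B)."""
    states_B = {q for (q, _) in B["delta"]}      # B is complete, so all states occur
    start = {(A["init"], q) for q in states_B - B["finals"]}
    seen, queue = set(start), deque(start)
    while queue:
        p, q = queue.popleft()
        if p in A["finals"] and q == B["init"]:
            return True                           # a final product state is reachable
        for a in alphabet:
            p2 = A["delta"][p, a]
            # Step backwards in B: all q2 with an a-transition into q.
            for (q2, b), target in B["delta"].items():
                if b == a and target == q and (p2, q2) not in seen:
                    seen.add((p2, q2))
                    queue.append((p2, q2))
    return False

def mutually_reversed(A, B, alphabet):
    """Check L(A) = L(B)^r via two emptiness tests, as in step 1."""
    return (not meets_reversed_complement(A, B, alphabet)
            and not meets_reversed_complement(B, A, alphabet))

# A accepts the words ending in b; B accepts the words starting with b.
A = {"init": 0, "finals": {1},
     "delta": {(0, "a"): 0, (0, "b"): 1, (1, "a"): 0, (1, "b"): 1}}
B = {"init": "s", "finals": {"y"},
     "delta": {("s", "a"): "n", ("s", "b"): "y", ("y", "a"): "y",
               ("y", "b"): "y", ("n", "a"): "n", ("n", "b"): "n"}}
```

The search visits at most $|A| \cdot |B|$ product states, which reflects the polynomial bound claimed in the proof.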


> Remark 5.4. An alternative proof that DFA + DFA" > NFAzim is in NP uses the 
following characterization of atomic nfas. Given an nfa N, let rsc(N") denote the dfa 
obtained by determinizing the reverse nfa N" via the subset construction and restricting to 
its reachable part. Then N is atomic iff rsc( N") is a minimal dfa [4, Corollary 2]. Thus, given 
a pair (A, B) of mutually reversed dfas, to decide whether natm(L(A)) < k one may guess a 
k-state nfa N and verify that rsc( N") is a minimal dfa equivalent to B. One advantage of our 
above categorical argument is that it yields simple certificates in the form of Dep-morphisms 
subject to certain commutative diagrams, which amount to solutions of equations in Rel. 
The latter may be directly computed using a SAT solver, leading to a practical approach 
to finding small atomic acceptors (cf. [9]). To this effect, let us note that the proof of 
Proposition 5.3 actually shows how to construct small atomic nfas rather than just deciding 
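their existence: every certificate $\mathcal{S}, \mathcal{P}, \mathcal{Q}, \mathcal{T}_a$ ($a \in \Sigma$) yields an atomic nfa with states $\mathcal{S}_s$,
transitions given by $(\mathcal{T}_a)_- \subseteq \mathcal{S}_s \times \mathcal{S}_s$ for $a \in \Sigma$, initial states $(\mathcal{I} \mathbin{;} \mathcal{P})_-[x] \subseteq \mathcal{S}_s$ and final
states $(\mathcal{Q} \mathbin{;} \mathcal{F}')_-^{\smile}[x] \subseteq \mathcal{S}_s$. (Recall that $;$ denotes composition in Dep and $(-)_-$ denotes the
maximum lower witness of a Dep-morphism, see Remark 3.2.) In fact, this is precisely the
nfa of join-irreducibles of the JSL-dfa $(S, \tau, p \circ i, f' \circ q)$ induced by the left diagram in (5.1).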
Analogous reasoning also applies to the computation of small subatomic nfas treated in 
Theorem 5.7 below. 
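
The atomicity test recalled in this remark (reverse, determinize, check minimality) can be sketched in a few lines. The encoding of an nfa as a dictionary from (state, letter) pairs to sets of successor states and all function names below are assumptions made for illustration; they do not come from [4] or from this paper. The sketch further assumes that the input nfa is trim, i.e. that every state is reachable and co-reachable.

```python
def reverse_nfa(delta, initial, final):
    """Flip all transitions and swap the roles of initial and final states."""
    rev = {}
    for (p, a), targets in delta.items():
        for q in targets:
            rev.setdefault((q, a), set()).add(p)
    return rev, set(final), set(initial)


def rsc(alphabet, delta, initial, final):
    """Reachable subset construction; returns (states, transitions, accepting)."""
    start = frozenset(initial)
    states, todo, trans = {start}, [start], {}
    while todo:
        X = todo.pop()
        for a in alphabet:
            Y = frozenset(q for p in X for q in delta.get((p, a), ()))
            trans[(X, a)] = Y
            if Y not in states:
                states.add(Y)
                todo.append(Y)
    accepting = {X for X in states if X & set(final)}
    return states, trans, accepting


def is_minimal(alphabet, states, trans, accepting):
    """Moore refinement: a reachable dfa is minimal iff no two states get merged."""
    block = {X: int(X in accepting) for X in states}
    count = len(set(block.values()))
    while True:
        sig = {X: (block[X],) + tuple(block[trans[(X, a)]] for a in alphabet)
               for X in states}
        ids = {}
        block = {X: ids.setdefault(sig[X], len(ids)) for X in states}
        if len(ids) == count:          # partition is stable
            return count == len(states)
        count = len(ids)


def is_atomic(alphabet, delta, initial, final):
    """N is reported atomic iff rsc(N^r) is a minimal dfa (input assumed trim)."""
    rev, rev_initial, rev_final = reverse_nfa(delta, initial, final)
    return is_minimal(alphabet, *rsc(alphabet, rev, rev_initial, rev_final))


# A 3-state permutation dfa, read as an nfa: its reverse is a minimal dfa.
cycle = {(i, "p"): {(i + 1) % 3} for i in range(3)}
swap = {(0, "t"): {1}, (1, "t"): {0}, (2, "t"): {2}}
print(is_atomic("pt", {**cycle, **swap}, {1}, {1}))  # True
```

Such a check only verifies a guessed nfa; it does not by itself produce the Dep-morphism certificates discussed above.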


> Proposition 5.5. The problem $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}_{\mathrm{atm}}$ is NP-hard.


Proof. We devise a polynomial-time reduction from the NP-complete problem BICLIQUE
COVER [8]: given a pair $(R, k)$ of a relation $R \subseteq R_s \times R_t$ between finite sets and a natural
number $k$, decide whether $R$ has a biclique cover of size at most $k$, i.e. $\dim(R) \le k$.
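
To make the source problem of the reduction concrete, the following sketch checks a candidate biclique cover and computes $\dim(R)$ by exhaustive search. It is only meant for very small relations; the representation of $R$ as a Python set of pairs and the function names are illustrative assumptions.

```python
from itertools import combinations


def is_biclique_cover(R, cover):
    """Check that every A x B in `cover` lies inside R and that together they exhaust R."""
    covered = set()
    for A, B in cover:
        rectangle = {(x, y) for x in A for y in B}
        if not rectangle <= R:
            return False
        covered |= rectangle
    return covered == R


def dim(R):
    """Least size of a biclique cover of R, by brute force (tiny inputs only)."""
    sources = list({x for x, _ in R})
    targets = {y for _, y in R}
    # Every biclique A x B extends to A x B', where B' consists of the common
    # neighbours of A, so it suffices to search among these maximal rectangles.
    rectangles = set()
    for size in range(1, len(sources) + 1):
        for A in combinations(sources, size):
            B = [y for y in targets if all((x, y) in R for x in A)]
            if B:
                rectangles.add(frozenset((x, y) for x in A for y in B))
    rectangles = list(rectangles)
    for k in range(len(rectangles) + 1):
        for choice in combinations(rectangles, k):
            if set().union(*choice) == R:
                return k


R = {(0, 0), (0, 1), (1, 1)}
print(is_biclique_cover(R, [({0}, {0, 1}), ({1}, {1})]), dim(R))  # True 2
```

The exponential search is consistent with the NP-completeness of the problem: verifying a given cover is easy, whereas no efficient procedure for finding a smallest one is known.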

For any (R,k), let S = Open(R) be the finite semilattice of open sets corresponding to 
the Dep-object R, cf. Theorem 3.3, and let L = L(S) be its lattice language. We claim that 
the desired reduction is given by 


\[ (R, k) \mapsto (\mathrm{dfa}(L), \mathrm{dfa}(L^r), k), \]
where $\mathrm{dfa}(L)$ and $\mathrm{dfa}(L^r)$ are the minimal dfas for $L$ and $L^r$. Thus, we need to prove that
(a) $\dim(R) = \mathrm{natm}(L)$, and (b) the two dfas can be computed in polynomial time from $R$.


Ad (a). We have the following sequence of Dep-isomorphisms: 


\[ R \cong \mathrm{Pirr}(\mathrm{Open}(R)) = \mathrm{Pirr}(S) \cong \mathrm{Pirr}(\mathrm{SLD}(L(S))) = \mathrm{Pirr}(\mathrm{SLD}(L)) \cong \mathrm{DR}_L. \]


Lemma 3.9(2) and Proposition 4.5 then imply $\dim(R) = \dim(\mathrm{DR}_L) = \mathrm{natm}(L)$.


Ad (b). Let $J(\mathrm{Open}(R)) = \{j_1, \ldots, j_n\}$ and $M(\mathrm{Open}(R)) = \{m_1, \ldots, m_p\}$. Then $\mathrm{dfa}(L)$ and
$\mathrm{dfa}(L^r)$ are the automata depicted below, where $L$ and $L^r$ are their respective initial states.


[Figure: transition diagrams of $\mathrm{dfa}(L)$ and $\mathrm{dfa}(L^r)$, drawn using the symbols $|m\rangle$ ($m \in M(S)$) and $\langle j|$ ($j \in J(S)$).]

Both automata can be computed in polynomial time from $R$ using Lemma 3.11. ◀


Next, we turn to the computation of small subatomic nfas. While in the atomic case the 
input language was specified by a pair of dfas, we now assume an algebraic representation: 


> Definition 5.6. A monoid recognizer is a triple $(M, h, F)$ of a finite monoid $M$, a map
$h\colon \Sigma \to M$ and a subset $F \subseteq M$. The language recognized by $(M, h, F)$ is given by
$L(M, h, F) = \bar{h}^{-1}[F]$, where $\bar{h}\colon \Sigma^* \to M$ is the unique extension of $h$ to a monoid morphism.


It is well-known [27] that a language $L$ is regular iff it has a monoid recognizer. In this case,
a minimal monoid recognizer for $L$ is given by $(\mathrm{syn}(L), \mu_L, F_L)$ where $\mu_L\colon \Sigma \to \mathrm{syn}(L)$ is
the domain restriction of the syntactic morphism and $F_L = \{[w]_{\equiv_L} : w \in L\}$. It satisfies
$|\mathrm{syn}(L)| \le |M|$ for every recognizer $(M, h, F)$ of $L$. Consider the following decision problem:


$\mathrm{MON} \to \mathrm{NFA}_{\mathrm{syn}}$
Input: A monoid recognizer $(M, h, F)$ and a natural number $k$.
Task: Decide whether there exists a $k$-state subatomic nfa accepting $L(M, h, F)$.


Here we assume that the monoid M is explicitly given by its multiplication table. 
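
As an illustration of this input format, the following sketch evaluates membership in $L(M, h, F)$ directly from a multiplication table. The encoding (monoid elements as indices $0, \ldots, |M|-1$, the table as a list of rows) and the example recognizer are assumptions chosen for the illustration.

```python
def find_unit(mult):
    """Return the index e with e*m = m = m*e for all m."""
    n = len(mult)
    for e in range(n):
        if all(mult[e][m] == m == mult[m][e] for m in range(n)):
            return e
    raise ValueError("multiplication table has no unit")


def evaluate(mult, h, word):
    """Image of `word` under the monoid morphism extending the letter map h."""
    value = find_unit(mult)
    for letter in word:
        value = mult[value][h[letter]]
    return value


def accepts(mult, h, F, word):
    """Membership test for the language recognized by (M, h, F)."""
    return evaluate(mult, h, word) in F


# Hypothetical recognizer: the two-element group, with "a" mapped to its generator.
# It recognizes the words containing an even number of a's.
Z2 = [[0, 1],
      [1, 0]]
h = {"a": 1, "b": 0}
F = {0}
print(accepts(Z2, h, F, "abab"))  # True
```

Each membership query takes one table lookup per letter, so the recognizer is an explicit, if not succinct, specification of the language.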
> Theorem 5.7. The problem $\mathrm{MON} \to \mathrm{NFA}_{\mathrm{syn}}$ is NP-complete.


Proof sketch. The proof is conceptually similar to the one of Theorem 5.2. To show the
problem to be in NP, one uses the algebraic characterization of $\mathrm{nsyn}(L)$ in Theorem 2.1(2)
and translates the ensuing $\mathrm{JSL}_f$-diagram into Dep. To show NP-hardness, one reduces from
BICLIQUE COVER via

\[ (R, k) \mapsto ((\mathrm{syn}(L), \mu_L, F_L), k), \]
where again $L = L(\mathrm{Open}(R))$. ◀


Our complexity results indicate a trade-off: computing small subatomic nfas requires
a less succinct representation of the input language. Generally, $|\mathrm{dfa}(L)|, |\mathrm{dfa}(L^r)| \le |\mathrm{syn}(L)|$,
and the syntactic monoid can be far larger, even for nuclear languages.
> Example 5.8. For any natural number $n$ consider the dfa $A_n = (\{0, \ldots, n-1\}, \delta, 1, \{1\})$
over the alphabet $\Sigma = \{\pi, \tau\}$ with $\delta_\pi(i) = i + 1 \bmod n$ for $i = 0, \ldots, n-1$, and $\delta_\tau(0) = 1$,
$\delta_\tau(1) = 0$, $\delta_\tau(i) = i$ otherwise. Let $L_n = L(A_n)$ denote its accepted language. Then:

1. Both $A_n$ and its reverse nfa are minimal dfas; in particular, $|\mathrm{dfa}(L_n)| = |\mathrm{dfa}(L_n^r)| = n$.

2. We have $|\mathrm{syn}(L_n)| = n!$. To see this, recall that $\mathrm{syn}(L_n)$ is the transition monoid of
$A_n = \mathrm{dfa}(L_n)$. It is generated by the $n$-cycle $\delta_\pi = (0\ 1\ \cdots\ n-1)$ and the transposition
$\delta_\tau = (0\ 1)$; hence it equals the symmetric group $S_n$ on $n$ letters.

3. By part (1) the language $L_n$ is bideterministic [30], i.e. accepted by a dfa whose reverse
nfa is deterministic. This implies that the left derivatives of $L_n$ are pairwise disjoint, so
$\mathrm{SLD}(L_n)$ is a boolean algebra. In particular, $L_n$ is a nuclear language.
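
The size claim in part (2) can be checked experimentally for small $n$. The sketch below closes the two letter actions of $A_n$ under composition and compares the number of resulting transformations with $n!$; the tuple encoding of transformations and the function names are illustrative assumptions, and the code requires $n \ge 2$.

```python
from math import factorial


def transition_monoid(n, generators):
    """All transformations of {0,...,n-1} induced by words over the generators."""
    identity = tuple(range(n))
    monoid, todo = {identity}, [identity]
    while todo:
        f = todo.pop()
        for g in generators:
            fg = tuple(g[f[i]] for i in range(n))  # apply f first, then g
            if fg not in monoid:
                monoid.add(fg)
                todo.append(fg)
    return monoid


def generators_of_A(n):
    """The maps induced by the two letters of A_n (requires n >= 2)."""
    cycle = tuple((i + 1) % n for i in range(n))
    swap = tuple({0: 1, 1: 0}.get(i, i) for i in range(n))
    return [cycle, swap]


for n in range(2, 6):
    # Prints n, the size of the transition monoid, and n! for comparison.
    print(n, len(transition_monoid(n, generators_of_A(n))), factorial(n))
```

The dfa has $n$ states while the closure contains $n!$ elements, which is the gap between the two input representations discussed above.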

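Finally, we further justify the choice of inputs for $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}_{\mathrm{atm}}$ and $\mathrm{MON} \to \mathrm{NFA}_{\mathrm{syn}}$:
the two modified problems $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{atm}}$ and $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{syn}}$, where only a single
dfa is given, are computationally much harder.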


> Theorem 5.9. $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{atm}}$ and $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{syn}}$ are PSPACE-complete.


Proof. This follows by inspecting Jiang and Ravikumar's [15] argument that $\mathrm{DFA} \to \mathrm{NFA}$
is PSPACE-complete. These authors give a polynomial-time reduction from the PSPACE-complete
problem UNIVERSALITY OF MULTIPLE DFAS, which asks whether a
given list $A_1, \ldots, A_n$ of dfas over the same alphabet $\Sigma$ satisfies $\bigcup_i L(A_i) = \Sigma^*$. For any
$A_1, \ldots, A_n$ they construct a dfa $A$ over some alphabet $\Gamma$ and a natural number $k$ such that:
1. If $\bigcup_i L(A_i) \neq \Sigma^*$, then every nfa accepting $L(A)$ requires at least $k + 1$ states.

2. If $\bigcup_i L(A_i) = \Sigma^*$, then there exists an nfa accepting $L(A)$ with $k$ states.

In the proof of (2), an explicit $k$-state nfa $N = (Q, \delta, \{q_0\}, F)$ with $L(N) = L(A)$ is given,
see [15, Fig. 1]. It has the property that, after $\varepsilon$-elimination, for every state $q$ there
exists $w \in \Gamma^*$ with $\delta_w[q_0] = \{q\}$. This implies that every state $q$ accepts a left derivative
$w^{-1}L(N)$, i.e. $N$ is a residual nfa [7]. In particular, $N$ is both atomic and subatomic.
Consequently, $(A_1, \ldots, A_n) \mapsto (A, k)$ is also a reduction to both $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{atm}}$ and
$\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{syn}}$. ◀
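
The property used in this proof, namely that every state $q$ is reached alone by some word from the initial state, can be tested mechanically with a subset construction. The sketch below does this for an nfa without $\varepsilon$-transitions; the dictionary encoding and the function names are illustrative assumptions, and the small automata in the example are not the nfa from [15, Fig. 1].

```python
def singleton_reachable(alphabet, delta, initial):
    """States q such that some word leads from the initial set to exactly {q}."""
    start = frozenset(initial)
    seen, todo = {start}, [start]
    while todo:
        X = todo.pop()
        for a in alphabet:
            Y = frozenset(q for p in X for q in delta.get((p, a), ()))
            if Y not in seen:
                seen.add(Y)
                todo.append(Y)
    return {next(iter(X)) for X in seen if len(X) == 1}


def every_state_reached_alone(states, alphabet, delta, initial):
    """Sufficient condition for residuality used in the proof above."""
    return singleton_reachable(alphabet, delta, initial) == set(states)


# Hypothetical two-state nfa: {0} is reached by the empty word, {1} by "ab".
delta = {(0, "a"): {0, 1}, (1, "b"): {1}}
print(every_state_reached_alone({0, 1}, "ab", delta, {0}))  # True
```

The test explores reachable subsets and may therefore take exponential time in general; it is meant as a sanity check on small examples.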


6 Applications


We conclude this paper by outlining some useful consequences of our NP-completeness results 
concerning the computation of small nfas for specific classes of regular languages. 
6.1 Nuclear Languages 


As shown above, nuclear languages form a natural common generalization of bideterministic,
biRFSA, and lattice languages. Let $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}$ be the variant of $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}_{\mathrm{atm}}$
where the target nfas are arbitrary, i.e. the task is to decide $\mathrm{ns}(L(A)) \le k$. Then:


> Theorem 6.1. For nuclear languages, the problem $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}$ is NP-complete.


In fact, by Proposition 4.5(1) we have $\mathrm{ns}(L) = \dim(\mathrm{DR}_L)$ for nuclear languages, so NP
certificates are given by biclique covers. The NP-hardness proof is identical to that of
Theorem 5.2: the reduction involves a lattice language, which is nuclear by Lemma 4.4.
6.2 Unary languages 


For unary regular languages $L \subseteq \{a\}^*$, every two-sided derivative $(a^i)^{-1} L (a^j)^{-1}$ is equal to
the left derivative $(a^{i+j})^{-1} L$. Therefore, we have $\mathrm{natm}(L) = \mathrm{nsyn}(L)$ and the minimal dfa
for $L$ is the dfa structure of the syntactic monoid. From Theorem 5.7 we thus derive

> Theorem 6.2. For unary languages, the problem $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{syn}}$ is in NP.


This theorem generalizes the best-known complexity result for unary nfas, which asserts
that the problem $\mathrm{DFA} \to \mathrm{NFA}$ is in NP for unary cyclic languages [13], i.e. unary regular
languages whose minimal dfa is a cycle. In fact, for any such language $L$ we have shown in [26,
Example 5.1] that $\mathrm{nsyn}(L) = \mathrm{ns}(L)$, hence $\mathrm{DFA} \to \mathrm{NFA}$ coincides with $\mathrm{DFA} \to \mathrm{NFA}_{\mathrm{syn}}$.


6.3 Group languages 


A regular language is called a group language if its syntactic monoid forms a group. Several 
equivalent characterizations of group languages are known; for instance, they are precisely 
the languages accepted by measure-once quantum finite automata [3]. Concerning their 
state-minimal (sub)atomic acceptors, we have the following result: 


> Proposition 6.3. For any group language $L$, we have $\mathrm{nsyn}(L) = \mathrm{natm}(L)$.

Therefore, Theorem 5.2 implies

> Theorem 6.4. For group languages, $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}_{\mathrm{syn}}$ is in NP.


The complexity of the general $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}_{\mathrm{syn}}$ problem is left as an open problem.
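
Whether a language given by its minimal dfa is a group language is easy to test. Recall from Example 5.8 that the syntactic monoid is the transition monoid of the minimal dfa; a finite monoid of transformations generated by the letter actions is a group exactly when every letter acts as a permutation of the states. The following sketch checks this condition. The encoding of each letter action as a tuple and the function name are illustrative assumptions.

```python
def is_group_language(letter_maps):
    """letter_maps: dict letter -> tuple f, with f[i] the successor of state i
    in the minimal dfa. True iff every letter permutes the states."""
    return all(sorted(f) == list(range(len(f))) for f in letter_maps.values())


# The letter actions of a 3-state permutation dfa (a cycle and a transposition).
print(is_group_language({"p": (1, 2, 0), "t": (1, 0, 2)}))  # True
# A letter collapsing two states cannot be inverted.
print(is_group_language({"a": (0, 0)}))  # False
```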


7 Conclusion and Future Work


Approaching from an algebraic and category-theoretic angle we have studied the complexity 
of computing small (sub)atomic nondeterministic machines. We proved this to be much 
more tractable than the general case, viz. NP-complete as opposed to PSPACE-complete, 
provided that one works with a representation of the input language by a pair of dfas or a 
finite monoid, respectively. There are several interesting directions for future work. 

The particular form of our main two NP-complete problems suggests an investigation of
their variants $\mathrm{DFA} + \mathrm{DFA}^r \to \mathrm{NFA}$ and $\mathrm{MON} \to \mathrm{NFA}$ computing unrestricted nfas. The
reductions used in the proofs of Theorems 5.2 and 5.7 show both problems to be NP-hard, and
we have seen in Theorem 6.1 that they are in NP for nuclear languages. The complexity of
the general case is left as an open problem.

The classical algorithm for state minimization of nfas is the Kameda-Weiner method [18],
recently given a fresh perspective based on atoms of regular languages [29]. The algorithm
involves an enumeration of biclique covers of the dependency relation $\mathrm{DR}_L$. Since our base
equivalence $\mathrm{JSL}_f \simeq \mathrm{Dep}$ reveals a close relationship between biclique covers and semilattice
morphisms (e.g. Lemma 3.9), we envision a purely algebraic account of the Kameda-Weiner
method. We should also compare our canonical machines to the Universal Automaton [20],
a language-theoretic presentation of the Kameda-Weiner algorithm. For example, our
morphisms preserve the language whereas the Universal Automaton uses simulations.

Finally, the classes of nuclear and lattice languages — introduced as technical tools for our 
NP-completeness proofs — deserve to be studied in their own right. For instance, we expect 
to uncover connections between lattice languages and the characterization of finite simple 
non-unital semirings which are not rings [31, Theorem 1.7]. 